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SKELLAM SHRINKAGE: WAVELET-BASED INTENSITY 
ESTIMATION FOR INHOMOGENEOUS POISSON DATA 

By Keigo Hirakawa and Patrick J. Wolfe* 
Harvard University 

The ubiquity of integrating detectors in imaging and other ap- 
plications implies that a variety of real-world data are well modeled 
as Poisson random variables whose means are in turn proportional 
to an underlying vector-valued signal of interest. In this article, we 
first show how the so-called Skellam distribution arises from the fact 
that Haar wavelet and filterbank transform coefficients correspond- 
ing to measurements of this type are distributed as sums and differ- 
ences of Poisson counts. We then provide two main theorems on Skel- 
lam shrinkage, one showing the near-optimality of shrinkage in the 
Bayesian setting and the other providing for unbiased risk estimation 
in a frequentist context. These results serve to yield new estimators 
in the Haar transform domain, including an unbiased risk estimate 
for shrinkage of Haar-Fisz variance-stabilized data, along with ac- 
companying low-complexity algorithms for inference. We conclude 
with a simulation study demonstrating the efficacy of our Skellam 
shrinkage estimators both for the standard univariate wavelet test 
functions as well as a variety of test images taken from the image 
processing literature, confirming that they offer substantial perfor- 
mance improvements over existing alternatives. 

1. Introduction. Real-world information sensing and transmission de- 
vices are subject to various types of measurement noise; for example, losses 
in resolution (e.g., quantization effects), randomness inherent in the signal 
of interest (e.g., photon or packet arrivals), and variabilities in physical de- 
vices (e.g., thermal noise, electron leakage) can all contribute significantly 
to signal degradation. Estimation of a vector-valued signal / G M''^ given 
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noisy observations g G M therefore plays a prominent role in a variety of 
engineering applications such as signal processing, digital communications, 
and imaging. 

At the same time, statistical modeling of transform coefficients as latent 
variables has enjoyed tremendous popularity across these diverse applications — 
in particular, wavelets and other filterbank transforms provide convenient 
platforms; as is by now universally acknowledged, such classes of transform 
coefficients tend to exhibit temporal and spectral decorrelation and energy 
compaction properties for a variety of data. In this setting, the special case 
of additive white Gaussian noise is by far the most studied scenario, as the 
posterior distribution of coefficients is readily accessible when the likelihood 
function admits a closed form in the transform domain. 

The twin assumptions of additivity and Gaussianity, however, are clearly 
inadequate for many genuine engineering applications; for instance, mea- 
surement noise is often dependent on the range space of the signal /, effects 
of which permeate across multiple transform coefficients and subbands [12]. 
For instance, the number of photoelectrons gi accumulated by the ith el- 
ement of a photodiode sensor array — an integrating detector that "counts 
photons" — is well modeled as a Poisson random variable ~ V{fi), where 
fi is proportional to the average incident photon flux density at the ith 
sensor element. 

Recall that for g^ ~ Pifi) we have that Egi = Yaxgi = fi, and so in 
the case at hand fi reflects (up to quantum efficiency) the ith expected 
photoelectron count, with the resultant "noise" in the form of variability 
being signal-dependent and hence heteroscedastic. Indeed, the local signal- 
to-noise ratio at the ith sensor element is seen to grow linearly with signal 
strength as E gf j Var gi = l + implying very noisy conditions when dealing 
with inefficient detectors or low photon counts. 

Classical variance stabilization techniques dating back to Bartlett and 
Anscombe [1,2,7,8,34,35] yield an approach to Poisson mean estimation 
designed to recover homoscedasticity, with [9] providing a summary of more 
recent work. Here one seeks an invertible operator 7 : R^, typically 

by way of a compressive nonlinearity such as the component-wise square 
root, that (approximately) maps the heteroscedastic realizations of an inho- 
mogeneous Poisson process to the familiar additive white Gaussian setting: 

gi~P(/.),iG{l,2,...,Ar} ^ l{9)^M{-i{f)jN). 

Standard techniques may then be used to estimate 7(/) directly, with the 
inverse transform 7~^(-) applied post hoc. 

Inhomogeneous Poisson data can also be treated directly. For instance, 
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empirical Bayes approaches leverage the independence of Poisson variates 
via their empirical marginal distributions [29, 30], while multiparameter 
estimators borrow strength to improve upon maximum-likelihood estima- 
tion [4,10,16]; however, this ignores potential correlations amongst elements 
of /. To address such concerns, multiresolution approaches to Poisson in- 
tensity estimation were introduced to explicitly encode the dependencies 
between the Poisson variables in the context of Haar frames [22,24,27,33]. 
The relative merits of the various methods described above are well docu- 
mented [3, 19, 34, 35, 37] and will not be repeated here. 

In this paper, we address Poisson rate estimation directly in the Haar 
wavelet and Haar filterbank transform domains by way of the Skellam dis- 
tribution [31], whose use to date has been limited to special settings [17, 
18,20,21,38]. After briefly reviewing wavelet and filterbank coefficient mod- 
els in Section 2, we then describe in Section 3 new Bayesian and frequentist 
transform-domain estimators for both exact and approximate inference. Here 
we first derive posterior means under canonical heavy-tailed priors, along 
with analytical approximations to the optimal estimators that we show to 
be both efficient and practical. We then show how inhomogeneous Poisson 
variability leads to a variant of Stein's unbiased risk estimation [32] for para- 
metric estimators in the transform domain. Simulation studies presented in 
Section 4 verify the effectiveness of our approach, and we conclude with a 
brief discussion in Section 5. 

2. Wavelet and Filterbank Coefficient Models. 

2.1. Haar Wavelet and Filterbank Transforms. Consider a nested se- 
quence of closed subspaces {Vk}k&z of L^(M) satisfying the axioms required 
of a multiresolution analysis [26] . Then there exists a scaling function cj) G 
L2(M) such that the family {2~^/'^4) {2-^{- - i))}iez is an orthonormal ba- 
sis of Vfc for all G Z. There also exist a corresponding conjugate mirror 
filter sequence {hi\i^z and admissible wavelet tp, with Fourier transforms 
/i, tp respectively, satisfying 



Moreover, for any fixed scale 2'^ the wavelet family {2 ^-72^ — ijjiez 
forms an orthonormal basis of the orthogonal complement of Vk in Vk-i, 
and for all {i,k) £ 1? the wavelet families together comprise an orthobasis 
of L2(]R). 



( 



<j){2uj) 



imsart-aos ver. 2009/02/27 file: SkellainShrinkage.tex date: May 29, 2009 



4 



K. HIRAKAWA AND P. J. WOLFE 



Recursively expanding the above K times, and defining 

Wavelet coefficient Xk,i := (/, 2~''/^-0(V2^ - 0) 
Scaling coefficient Sk^i ■= (/, 2~'^'/^0(-/2'^ — i)), 

we see that any / E L^(]R) admits the fohowing orthobasis expansion in 
terms of its wavelet and scaling coefficients: 

oo 

f = 4>{- - i) 

i=—oo 

i=-Oo2 2 V ^ / fc=li=-00 2 2 V ^ / 

The mapping / i— > {sK,i,Xk,i} is termed a K-level continuous wavelet trans- 
form, with an analogous discrete wavelet transform defined for sequences in 

For the special case of a Haar wavelet transform, we take as our scaling 
function (j) = I[o,i] (the unit indicator), with hi = (2~^/^(/)(-/2), — i)) 
yielding ho = hi = 2~^/^ as the only nonzero conjugate mirror filter values. 
This in turn induces a recursive relationship as follows: 

^^-j f Xk,i = Sk-l^2i — Sk-l,2i+l, 

[Sk,i = Sk~l,2i + Sk-l,2i+l- 

In fact, this one-level transform is a version of a filterbank transform — 
a canonical multirate system of the type used for time-frequency analysis 
in digital signal processing. That is, h satisfies the perfect reconstruction 
condition [26] 

h*(uj)h{uj) + h*{uj — 7r)h{uJ — vr) = const, 
h*{uj)h{uj -Tr) + h*{uj - TT)h{uj) = 0. 

In the formulation of (1), each sequence {sk-i^i}i is decomposed into low- 
pass and highpass components {sk,i, Xk^i}i in turn. A recursive application of 
the map {sk-i^i} i— > {sk,i,Xk^i} yields the Haar wavelet transform, whereas 
the same transform applied to highpass component Xk~i^i further decom- 
poses it into narrower bands. Recursive decomposition of both lowpass and 
highpass sequences in this way yields the Hadamard transform, otherwise 
known as the Haar filterbank transform. 
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The low computational requirements of these transforms make them at- 
tractive alternatives to other joint time-frequency analysis techniques pos- 
sessing better frequency localization. The Haar transforms enjoy orthogonal- 
ity, compact spatial support, and computational simplicity, with the Haar 
wavelet transform satisfying the axioms of a multiresolution analysis. We 
later demonstrate how their simplicity serves to admit analytical tractabil- 
ity that in turn enables efficient inference and estimation procedures. 

As a final note, we omit subband index k in the sequel, as wavelet co- 
efficients Xf^^i are always aggregated within a given scale 2^; for notational 
clarity in the finite-dimensional setting, further suppression of subscript i 
will be used to indicate a generic scalar coefficient as distinct from 
vector-valued quantities (e.g., x) indicated in bold throughout. 

2.2. Transform-Domain Denoising. Turning to the problem of transform- 
domain denoising, consider the case whereupon a vector of noisy orthobasis 
coefficients y ~ M{x, g'^In) is observed, with x deterministic but unknown. 
Writing an estimator for x as X(Y) = Y +6iY)^ Stein's Lemma [32] may be 
used to formulate an unbiased estimate of the associated risk E || — ic||2 
as follows. 

Theorem 1 (Stein's Unbiased Risk Estimate (SURE) [32]). Let y ~ 
M{x, a'^I^), with X unknown, and fix an estimator X (Y) = Y + 0{Y) such 
that 6 : —>■ M.^ is weakly differentiable. Then the resultant risk may be 
formulated as 



with Na"^ + \\^{y)\\'2 + 2(T^ div 0(^/) an unbiased estimate thereof. 

Hence, by replacing the latter expectation of (2) with an evaluation over 
the vector y of observed transform coefficients, one may directly optimize pa- 
rameter choices for nonlinear shrinkage estimators — for example soft thresh- 
olding, given by 



As an example that we shall return to later, SUREShrink [6] is obtained 
from (2) and (3) by writing X{Y) = Y + 0{Y; r): 



(2) 



E\\X{Y) -x\\l = Na^ + E \\e{Y)\\l + 2a'^ div e{Y) , 



(3) 
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and thus r is chosen to minimize the empirical risk estimate 

N 

(5) iVa^ + J2 ^Hyf,r^) - 2tT' #{i : \yi\ < r}. 

i=l 

2.3. The Skellam Distribution. In contrast to the above setting of addi- 
tive white Gaussian noise, the distribution of inhomogeneous Poisson data 
9 '■ di ~ T^ifi) is not invariant under orthogonal transformation — and so 
transform-domain denoising ceases to be as straightforward in the general 
setting [12] . However, for the special cases of the Haar wavelet and filterbank 
transforms described in Section 2.1, we may characterize their coefficient dis- 
tributions in closed form as sums and differences of Poisson counts. 

To this end, let the matrix W S {0,ibl}^^^ denote an (unnormalized) 
Haar filterbank transform. Taking x := Wf to be the transform of / G Z^, 
the resultant wavelet and scaling coefficients comprise sums and differences 
of elements of /: 



(6) X := Wf 



{Wavelet coefficient Xi = xf — x] 
Scaling coefficient Si = xf + x^ 



(7) xt:= E Z^' ^^■■= E fr 

r.w,j=i j.w,j=-i 

An analogous definition with respect to the observed data gi ~ V{fi) and 
its Haar filterbank transform y := Wg implies that the empirical wavelet 
and scaling coefficients themselves comprise sums and differences of Poisson 
counts: 



(8) Wg 



{Empirical wavelet coefficient Ui = yf — 
Empirical scaling coefficient ti = yf + y~ ; 
(9) yT^v{xT), ti^V{si). 



Thus the empirical coefficients defined by (8) are effectively corrupted 
versions of those in (6). While the sum of Poisson variates yf and y~ is 
again Poisson, as indicated by the expression of (9) for empirical scaling 
coefficient tj, the distribution of their difference also admits a closed-form 
expression, first characterized by Skellam [31] using generating functions. 

Proposition 2.1. Fix G M+, and let the random variable Y £ Z 

denote the difference of two Poisson variates y^ ~ V{x'^) and y^ ^ V{x^). 
Defining Iy{-) to be the yth-order modified Bessel function of the first kind, 
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we have that 



(10) Pr(y = y;x+,x-) = e-(^^+^ ) j Iy{2^/: 

y G Z; G M+. 

Proof. A direct verification is provided by series representations of Bessel 
functions [11]. First, note that via correlation of Poisson densities we obtain 
directly 

(11) Pr(y = y;x+x-) = e-(-^+-)E Vffc - .V ' 

By change of variables in the summation index of (11) according to max(y, 0) = 
{\y\ + y)/2, we obtain a summand that is symmetric in y G Z as follows: 

(-(-\ 2 cx) / + —\k+ — 
X ) t^Q kl{\y\ + k)\ 

The result follows from the observation that Iu{-) admits, for positive argu- 
ment and order, the real- valued Taylor expansion 

coupled with the fact that I-u{-) = for G N. □ 

We have thus proved that the distribution of each empirical coefficient 
Ui = yf — y~ in (8) may be described as follows. 

Definition 2.1 (Skellam Distribution [31]). Let Y £ Z denote a differ- 
ence of Poisson variates according to (6) -(9), with index i suppressed for 
clarity as in Proposition 2.1. Then 

'KY = x'^ — x~ = X, Var Y = x'^ + x^ = s, 

where s > \x\, and variate y takes the Skellam distribution.- 

y ~ S{x, s); s G M+, —s<x<s 

y 

(12) p{y;x,s) = (^)'/y(v4^^) . 
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(a) Standard Skellam distribution 5(0, 1) 
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(b) Tail behavior with increasing variance 
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(c) Skewness in terms of mean and variance 



Fig 1. Illustrations of the Skellam distribution S{x, s) showing tail behavior and skewness. 
See Defimtion 2. 1 tn the text for details 



Remark 2.1 (Support and Limiting Cases). 

As the difference of two Poisson variates, a Skellam variate ranges over 
the integers unless either x'^,x~ = 0, in which case a direct appeal to the 
discrete convolution of (11) recovers the limiting Poisson cases. On the other 
hand, as both x^,x~ — > oo, it follows from the Central Limit Theorem that 
the distribution of a Skellam variate tends toward that of a Normal. 

Remark 2.2 (Skewness and Symmetry). 

The skewness of a Skellam random variable is easily obtained from its gen- 
erating function as s~^/^x [31], and hence is proportional to the difference 
in Poisson means x"*" and x~ , with a rate that grows in inverse proportion to 
their sum. Indeed, when x = the distribution is symmetric, with variance s 
proportional to the geometric mean of x^ and x~ according to (10). A stan- 
dard 5(0, 1) Skellam random variable is shown in Fig. 1(a), with Fig. 1(b) 
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detailing the tail behavior of other symmetric cases 5(0, s); examples illus- 
trating skewness as a general function of mean and variance are shown in 
Fig. 1(c). 

Returning now to our context of Haar transforms, we next observe that 
the density of empirical coefficient yi depends only on the corresponding 
wavelet and scaling coefficients Xj and Sj (and similarly for the coarsest 
Haar wavelet subband). 

Proposition 2.2. Let m ~ S{xi,Si) according to Definition 2.1, with 
X := Wf a vector of Haar filterbank transform coefficients, and y that of 
the empirical coefficients. Then 

p{yi;f) = p{yi;si,Xi). 

Proof. The relation is a straightforward consequence of the choice of 
transform. From the definitions in (7), 

p{yi;f) = p{yi;xf,Xi) 

= P{yi ; Y.j:Wij=l fj^J2j:Wij=~l fj) 

= p{yi ; Er.w,,=i{w-^x)j,j2j.^w^^^_^{w-^x)^ . 

Let Vi and Wi be row vectors from W such that Si = Vif and Xj = Wif, 
respectively. It is easily verified that the jth entry of {vi + Wi) /2 is nonzero 
if and only if Wij = 1, and hence 

p{y^ ; /) = p{y^ ; (^) w-'x, («) w-'x) 
= piyi;^^,^^) =p{yi;si,xi). 

□ 

3. Wavelet-Domain Poisson Intensity Estimation. Recall our goal 
of leveraging properties of Haar wavelets and filterbanks to accomplish 
transform-domain intensity estimation for inhomogeneous Poisson data. To 
this end there are two main conclusions to be drawn from Section 2.3 above: 
First, Poisson variability in the data domain gives rise to Skellam variability 
in Haar transform domains (Definition 2.1). Second, the conditional indepen- 
dence structure of Haar coefficients suggests univariate Skellam estimators 
as a first step toward achieving satisfactory performance (Proposition 2.2). 

Accordingly, we now turn our attention to deriving univariate Skellam 
mean estimators under both Bayes and frequentist assumptions. We work 
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throughout with the generic scalar quantity Y ~ S{x, s), where Haar scahng 
coefficient s is given and Haar wavelet coefficient x is a latent variable, 
assumed to be random or deterministic depending on context. Although the 
scaling coefficient is not directly observed in practice, this standard wavelet 
estimation assumption amounts to using the empirical scaling coefficient 
ti of (8) as a plug-in estimator of Sj in (6). As Haar scaling coefficients 
constitute sums of Poisson variates in this context, their expected signal-to- 
noise ratios are likely to be high, in keeping with the arguments of Section 1, 
and moreover they admit asymptotic Normality. 

3.1. Key Properties of the Skellam Likelihood Model. We ffi'st develop 
some needed properties of the Skellam likelihood model; while these follow 
from standard recurrence relations for Bessel functions of integral order, 
probabilistic derivations can prove more illuminating. We begin with ex- 
pressions for partial derivatives of the Skellam distribution. 

Property 3.1 (Derivatives of the Skellam Likelihood). Partial deriva- 
tives of the Skellam likelihood p{y ; x, s) admit the following finite-difference 
expressions: 

d 1 

— p(y ; X, s) = - [p{y-l ; x, s) - p{y+l ; x, s)] 
d 1 

— p(y ;x,s) = - [p(y-l ;x,s) + p{y +1 ; x , s)] -p{y;x,s). 
OS 2 

Proof. Recall from Definition 2.1 that a Skellam variate Y ~ 5(x, s) 
comprises the difference of two Poisson variates with respective means x"*" 
and x~ . Denoting by the (conjugate) Fourier transform operator acting 
on the corresponding probability measure, its characteristic function in uj 
follows as 

Tp{y ;x,s) = exp x~^{e^'^ ~ 1) + x^{e^^^ — 1) 



exp 



Ks + x)eJ'^ + ^(s-x)e-J'^-s 



and hence, invoking linearity, we may compute derivatives as: 

^piy;x,s) = :F-^^{:Fp){u;) = ^''l (e^--e-^-) {:Fp), 
and similarly for the partial derivative of p{y ; x, s) in s. □ 
Remark 3.1. 
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Property 3.1 implies that {d / dx)p{y]x^ s) is the normalized first central 
difference of the likelihood on its domain y, and that {d/ds)p{y ; x, s) is one- 
half the normalized second central difference. Hence slope and curvature of 
the likelihood are encoded directly in the Skellam score functions. 

Next, we note that for G N the standard Bessel identity Iu{t) = —2{v — 
1) /t Iy^i(t) + Iu-2ii) implies the following. 

Property 3.2 (Skellam Likelihood Recursion). The Skellam likelihood 
p{y ; X, s) admits the following recurrence relation in y for fixed {x, s): 

—2(y — 1) s + X 

p{y ;x,s) = P(y - 1 ; x, s) H p{y -2; x, s) . 

s — X s — X 

Remark 3.2. 

This property lends itself to easy calculation of the Skellam likelihood, 
as fixed initial values may be tabulated and used to initialize the recursion, 
thus avoiding the evaluation of Bessel functions. 

Combining Properties 3.1 and 3.2, we have our final result. 

Property 3.3 (Skellam Differential Equation). The Skellam likelihood 
p{y ; X, s) satisfies a linear, first-order hyperbolic partial differential equation 
in (x, s), for fixed y, as follows: 

d d 
(13) {y - x) p{y ■x,s) = s g^Piv ■,x,s) + x g^Piv ;x,s). 

3.2. Prior Models and Posterior Inference via Shrinkage. Having devel- 
oped needed properties of the Skellam likelihood p{y ; x, s) above, and with 
s assumed directly observed, we now consider the setting in which each 
underlying transform coefficient x : |x| < s is modeled as a random vari- 
able. While determining the most appropriate choice of prior distribution 
for different problem domains remains an open area of research, with exam- 
ples ranging from generalized Gaussian distributions through discrete and 
continuous scale mixtures, we make no attempt here to introduce new in- 
sights on prior elicitation. Rather, we focus on optimal estimation for general 
classes of prior distributions having compact support. 

The problem being univariate, exact inference is realizable through nu- 
merical methods; however, the requisite determination of prior parameters, 
possibly from data via empirical Bayes, renders this approach infeasible in 
practice, as posterior values cannot be easily tabulated in advance. To this 
end, the main result of this section is an approximate Skellam conditional 
mean estimator with bounded error, obtained as a closed-form shrinkage 
rule. 
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Theorem 2 (Skellam Shrinkage). Consider a Skellam random variable 
Y ~ S{X, s), with s fixed but X a random variable that admits a density with 
respect to Lebesgue measure on [—s,s\. Define the Bayes point estimator 



(14) 



X 



Y - s¥.[^\np{Y\X;s)\Y]s] , 



whence a projection of the score function in x via conditional expectation. Its 
squared approximation error, relative to the conditional expectation Xmmse '■= 
K{X \Y; s), then satisfies 

{XMMSE-Xf <^{x^\Y;s) E(^[§-Jnp{Y\X;s)f\Y;sy 

Proof. Bayes' rule applied to the differential equation of (13) yields the 
necessary conditional expectations, after which Cauchy-Schwarz serves to 
bound its latter term. □ 

While we cannot control the second moment of X conditioned on Y in 
the bound above, its latter term admits by Property 3.1 the equivalence 



E 



^lnp{Y\X;s) \Y;s 



E 



i5^p)iY\X;s) 
2p(Y\X;s) 



Y-s 



where 82 {•) denotes the normalized second central difference in Y, analo- 
gous to a second derivative. This term therefore goes as the square of the 
normalized local curvature in the likelihood at y = y, averaged over the 
posterior distribution of X; it will be small on portions of the domain over 
which the likelihood remains approximately linear for sets of X having high 
posterior probability. 

Theorem 2 thus provides a means of obtaining Bayesian shrinkage rules 
under different choices of prior distribution p(X;s), via evaluation of the 
expectation of (14) as 



(15) 



p{x\y,s)\'_^-E[^lnp{X-s)\Y;s 



While the above formulation is amenable to further approximation via Tay- 
lor expansion (akin to Laplace approximation), we focus here on a direct 
evaluation of E lnp{X; s) \ Y; s^. 

Discounting the former term of (15), which simply measures the difference 
in posterior tail decay at x = is and goes to zero with increasing s, the 
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derivative on [— s, s] is easily computed for the so-called generalized Gaussian 
distribution for p > 0, with location parameter fx and scale parameter ax- 



/I 1 N P 

_j_(\X- fl\'^ 

'C(P) 



with r(-) the Gamma function and CCp) = [r(l/p)/r(3/p)]^/^. 

This distribution being unimodal and symmetric about its mean, we ob- 
tain for /X = the expression 

E lnp(X; s) \Y;s)= (sgn(X)|X|f-i \Y;s), 

from which the Gaussian (p = 2) and Laplacian = 1) cases admit straight- 
forward evaluation. 

Proposition 3.1 (Truncated Normal and Laplace Priors). Let g{x) de- 
note a generalized Gaussian distribution with exponent p > having mean 
zero and variance cx^, and set p{x;s) = g{x)I[_g^g]{x). For Y ~ S{X,s) we 
then have: 

If p = 2 so that p{x ; s) oc e^^ then 

(16 VarX = 2o-^ = a^- , [k 

7(1/2, sV2cj^) ^erf(s/V2cT^) 

OTt/i 7(-,-) i/ie lower incomplete Gamma function, and 

E[^lnp{X;s)\Y;s) = -a~'E{X\Y;s). 

If p = 1 so that p{x ; s) cx e~l'^l/(°"^/^)l[_5^5](x), t/ien 

(17) VarX = -l^J^l^^^ = ^ _ sjs + V-2ax)e-^^^/^^ 
^ ^ 2 7(l,V2s/a^) 1 _e-v^s/<x. 



E 



lnp{X; s) \Y;s)= -^a'^ E(sgn(X) \Y-s). 



Combining Proposition 3.1 with Theorem 2 yields approximate poste- 
rior mean estimators under truncated Gaussian and Laplacian priors. The 
Gaussian case recovers the shrinkage rule 



a1 



(18) X = — ^F, 
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-50 50 

Fig 2. Illustration of the shrinkage implicit m (19) as a function of the posterior distri- 
bution p{y \ x; s) in the case of a Skellam likelihood, showing the contribution of posterior 
mass to shrinkage toward and away from zero 

the optimal linear estimator under a second-moment Normal approximation 
to the Skellam likelihood, with mean zero and variance s. The heavier-tailed 
Laplacian case yields an implicit shrinkage rule illustrated in Fig. 2, whose 
asymptotic behavior in turn enables a simple soft-thresholding rule to be 
fitted: 

(19) X = Y - — [Pr(X >0\Y;s)- Pr(X <0\Y; s)] 

(20) ~ sgn(y) max 

The soft-thresholding estimator of (20) can in turn be adapted to yield a 
piecewise- linear estimator whose slope matches that of (19) at the origin. 
To accomplish this, note that for any prior distribution with even symme- 
try, (12) of Definition 2.1 implies odd symmetry of the posterior expectation 
functional; i.e., E(X \ Y = y; s) = — E(X | Y = —y ; s). Therefore the slope 
of any shrinkage estimator at the origin may be computed as 

^[E{X\Y = l;s) -E{X\Y = -l;s)] = E{X \Y = 1 ; s). 

The slope term E{X | y = 1 ; s) may in turn be pre-computed to arbitrary 
accuracy using numerical methods, and indexed as a function of s and prior 
variance u^, yielding the following piecewise-linear shrinkage estimator: 

(21) X = sgn(y)max(|y|-\/2s/cj^., E{X\Y = l;s) 

Figure 3 in turn compares the exact posterior mean shrinkage rule, corre- 
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Fig 3. Bayesian shrinkage rules corresponding to a Laplacian prior and Skellam likelihood, 
with dotted 45° line shown for reference: Skellam Bayes (SB) MMSE shrinkage rule, com- 
puted numerically; soft-thresholding (SBT) approximation of (20); and piecewise-linear 
(SBL) approximation of (21) 

spending to E(X | Y; s) and computed numerically, with the soft-thresholding 
estimator of (20) and the piecewise-linear estimator of (21). The ideas above 
can be straightforwardly extended to the multivariate case [14] , owing to con- 
ditional independence properties of the Skellam likelihood; derivatives may 
also be computed for the case of mixture priors, though no efficient solution 
is yet known to compute the mixture weights. 

3.3. Parameter and Risk Estimation for Skellam Shrinkage. Having de- 
rived Bayes estimators for the class of unimodal, zero-mean, symmetric pri- 
ors considered above, we now turn to parameter and risk estimation for 
Skellam shrinkage. With only a single observation of each Haar coefficient 
in this heteroscedastic setting, maximum-likelihood methods will simply re- 
turn the identity as a shrinkage rule. However, by borrowing strength across 
multiple coefficient observations we may improve upon the risk properties of 
this approach; as we now detail, this is equally attainable in a frequentist or 
Bayes setting. Here we consider coefficient aggregation within a given scale, 
with notation J2i{')i below indicating summation over location parameter i 
within a single Haar subband. 

The main result of this section is the following theorem, which yields a 
procedure for unbiased i'^ risk estimation in the context of soft thresholding 
and other shrinkage operators. 

Theorem 3 (Unbiased Risk Estimation). Let yi ~ 5(xj,Si) and ti = 
yf + according to (8), with Xi,Si unknown. Fix a vector-valued estimator 



X{Y,T) = Y 0{Y,T), where 6 : x ^ , and let 1 denote the 
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vector of all ones. Then the ^ risk of X{Y,T) may be expressed as follows: 



(22) E\\X{Y,T)-x\\^2 = ^ \\^{Y,T)\\i + \\T\\i + 2{Y,e{Y,T)) 

-{T+Y,e{Y-l,T-l)) + {T-Y,0{Y+l,T-l)) 

with \\e{y, t) \\l+\\t\\i+2{y, e{y, t)) -{t+y, e{y-l, t-l))+{t-y, e{y^l,t-l)) 

an unbiased estimate thereof 

Proof. The risk E \\X{Y,T) - xWl may be expanded as 

(23) E\\e{Y,T)\\l + E\\Y - x\\l + 2E{Y-x,e{Y,T)) , 

with E \\Y - x\\l = Y^iVaiYi = Y,iSi = E \\T\\i. To evaluate the final term 
in (23) above, note first that Y — x = {Y~^ — Y~) — {x~^ — x~) according 
to (6) and (8), and furthermore that Y^ = (T it Y)/2. By conditioning 
on Y+ or Y we in turn obtain Poisson variates, and thus general results 
for discrete exponential families [10, 16] apply, yielding the final relations 
needed to complete the proof of Theorem 3: 

E(F± - x^, e{Y, T)) = E{Y^, 0{Y, T) - OiY =F 1, T - 1)). 

□ 

Parameters of any chosen estimator form X{Y, T) may thus be optimized 
by minimizing the unbiased risk estimate of Theorem 3 with respect to 
observed data vectors y and t. As an important special case, we obtain the 
following corollary. 

Corollary 3.1 (SkellamShrink) . The optimal threshold r for soft thresh- 
olding as Xi(Yi]T) := sgn(l^) max(|l^| — r, 0) is obtained by minimizing 

(24) Y.^gn(\y,\ - T)ti + ^ min(y2, r^) - T#{i : = r}. 

i i 

Remark 3.3. 

Recall the Stein's unbiased risk estimate SUREshrink result [6] for soft 
thresholding in the case of additive white Gaussian noise of variance o"^, as 
described in (3)-(5) of Section 2.2. Recasting the objective function of (5) 
for SUREshrink threshold optimization as 

(25) sgn(|yi| -r)a2+^ min(y2, r^), 

i i 

we see that ti in (24) plays a role analogous to o"^ in the homoscedastic 
SUREShrink setting represented by (25), with the dependence on coefficient 
index i reflecting the heteroscedasticity present in the Skellam likelihood 
case. 
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3.3.1. SkellamShrink with Adjusted Thresholds. We may also consider a 
generalization of the SkellamShrink soft thresholding estimator of Corol- 
lary (3.1), inspired by the Bayes point estimator sgn(yj) max(|yj| — r(si), 0) 
of Theorem 2, in which individual coefficient thresholds depend in general on 
the corresponding scaling coefficient. By treating the quantity ax appearing 
in the Bayesian estimators of Section 3.2 not as a prior variance parame- 
ter, but simply as part of a parametric risk form to be optimized, we may 
appeal directly to the unbiased risk estimation formulation of Theorem 3. 
Since a priori knowledge limitations may well preclude exact prior elicitation 
in practice, this flexible approach provides a degree of robustness to prior 
model mismatch, as borne out by our simulation studies below. 

As an example, consider a shrinkage estimator Xi(Yi,Ti) = yi+9(Yi, Tf, ax) 
that depends on Tj and unknown parameter ax as per the soft thresholding 
formulation of (20): 

-sgn(y,)^T, if|y.|>^r, 

-Yi if m < ^T,; 

Defining ax = ax/V^ and ii = {ti — l)/ax for notational convenience, we 
have the risk estimate 

J2 ss^^ i\yi\ ~ + {yh tl/^l) 

i i 

-2 5: E diti); 

i-\yi\>U/d^ i:\yi\ = \ii-l] i■■\y^\ = lU+^\ 

C{ti) = ti ([til - ti) - \{ii - l)f + {ti - 1) \{ti - 1)1, 

d(t) = [^'^^^'^ ~ ~ + 1)J^ + - 1) L(*i + 1)J if ii/^x > [ii + IJ 
\U{[U\ - U) + [{ii + 1)J ^ - {ii - 1) L(** + 1)J if ti/ax < [ii + Ij . 

with [-J and [•] denoting the floor and ceiling operators, respectively, and 
c{ti) and d{ti) adjusting for the singularity at \yi\ = [Ti] it 1. 

3.3.2. Unbiased Risk Estimates for Variance-Stabilized Shrinkage. The 
strategy outlined above naturally generalizes to any form of parametric esti- 
mator via the unbiased risk estimation formulation of Theorem 3, enabling 
an improvement over the variance-stabilization strategies of Section 1 by 
direct minimization of empirical risk. As a speciflc example, consider the 
Haar-Fisz estimator of [9], in which each empirical Haar wavelet coefficient 
Hi is scaled by the root of its corresponding empirical scaling coefficient as 



(26) 



d{Yi,Ti;ax) 
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Vi '■= Ui/V^ in order to achieve variance stabilization, after which stan- 
dard Gaussian shrinkage methods such as SUREShrink are appHed and the 
variance stabihzation step inverted. 

For the case of nonhnear shrinkage operators, of course, neither the resul- 
tant estimators nor the risk estimates themselves will in general commute 
with this Haar-Fisz strategy, leading to a loss of the unbiasedness property 
of risk minimization — in contrast to the direct application of Theorem 3. 
Taking Haar-Fisz soft thresholding with some fixed threshold r as an ex- 
ample, the equivalent Skellam shrinkage rule is seen to be Xi(Yi,Tf,T) = 



sgn(l^) max(|l^| — \/JiT,0) — with \/Ti in contrast to the scaling of Tj im- 
plied by Theorem 2, as in the adjusted-threshold approach of (26) above. 
In an analogous manner, the corresponding exact unbiased risk estimate 
for this shrinkage rule can in turn be derived directly by appeal to Theo- 
rem 3, rather than relying on the heretofore standard Haar-Fisz approach of 
SUREShrink empirical risk minimization via (25), applied to the variance- 
stabilized coefficients jji. 

3.3.3. Empirical Bayes via Method of Moments. We conclude this sec- 
tion with a simple and effective empirical Bayes strategy for estimating 
scaling coefficients {si} and prior parameter ax for the Bayesian shrinkage 
rules derived in Section 3.2 above. Recall from (9) that = ETj, implying 
the use of the empirical scaling coefficient tj as a direct substitute for Sj in 
the Bayesian setting. Note that Si = \\y--\=i fj for Haar transform matrix 
W, with Ti a corresponding sum of Poisson variates with means fj repre- 
senting the underlying intensities of interest to be estimated. In turn, as the 
sum Si increases, the relative risk K\Ti — Sjp/s? of the plug-in estimator 
Si = Ti will rapidly go to zero precisely at rate 

Next note that under the assumption of a unimodal, zero-mean, and sym- 
metric prior distribution p(X; s), only VarX remains to be estimated. A con- 
venient moment estimator is available, since Tj ~ V{si) and Yi ~ S{xi,Si) 
together imply that VarTj = Varl^ = El^^— Xf, and hence we obtain 
VarX = {l/N) J2iyi ~ ^i- Once estimates VarX and {si} are obtained for 
the coefficient population of interest, the implicit variance equations of (16) 
and (17) may be solved numerically to yield scale parameter ax of the trun- 
cated generalized Gaussian distribution considered earlier, with o"^. = Var X 
in the limit as s grows large. In our simulation regimes, we observed no 
discernable difference in overall wavelet-based estimation performance by 
setting ax = VarX directly. 

4. Simulation Studies. We now describe a series of simulation stud- 
ies undertaken to evaluate the efficacy of the wavelet-based shrinkage esti- 
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mators derived above. We considered exact Skellam Bayes (SB) posterior 
mean estimators, computed numerically with respect to a given prior; the 
Skellam Bayes Gaussian approximation (SBG) linear shrinkage of (18); the 
Skellam Bayes Laplacian soft-thresholding (SBT) approximation of (20); 
the Skellam Bayes Laplacian piecewise- linear (SBL) approximation of (21); 
the SkellamShrink (SS) soft-thresholding estimator with empirical risk min- 
imization of Corollary 3.1; and the SkellamShrink hybrid (SH) adjusted- 
threshold shrinkage of (26). 

Estimators were implemented using a 3-level undecimated Haar wavelet 
decomposition, with empirical risk minimization or the moment methods 
of Section 3.3.3 above used to estimate parameters for the corresponding 
shrinkage rules. As first comparison of relative performance. Figs. 4 and 5 
tabulate results in mean-squared error (MSE) for Skellam likelihood infer- 
ence in cases when the latent variables of interest are drawn from Normal 
and Laplacian distributions with known parameter cr^ G {32,64, 128}. The 
accompanying box plots are shown on a log-MSE scale for visualization pur- 
poses, in order to better reveal differences between estimator performance. 
These figures confirm that exact Bayesian estimators (SB) outperform all 
others, but indicate that prior-specific Skellam Bayes approximations SBG 
and SBL are comparable, respectively, for the Gaussian and Laplacian cases 
over the range of prior parameters shown here. Among soft-thresholding ap- 
proaches, the frequentist SkellamShrink methods SS and SH in turn offer 
improvements over the Bayesian soft-thresholding estimator SBT. 

4.1. Evaluation via Standard Wavelet Test Functions. We next consider 
the standard set of univariate wavelet test functions: "smooth," "blocks," 
"bumps," "angles," "spikes," and "bursts," as illustrated in Fig. 6. A thor- 
ough comparative evaluation of several Poisson intensity estimation meth- 
ods using these test functions is detailed in [3], and here we repeat the 
same set of experiments using the estimators outlined above, along with 
the best-performing methods reviewed in [3] — including variance stabiliza- 
tion techniques currently in wide use as well as the more recent methods 
of [22,33]. To retain consistency with the experimental procedure of [3], 
all methods except for [22] were implemented using a 5-level translation- 
invariant wavelet decomposition; the implementation of [22] provided by [3] 
employs a decomposition level that is logarithmic in the data size, which we 
retained here. 

As can be seen from Fig. 7, the Skellam-based techniques we propose 
here measure well against alternatives despite the diversity of features across 
these test functions, and the corresponding possibilities of model mismatch 
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(a) 


„2 

= 
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(b) = 


64 




MSE 


SS 


SB 


SBG 


SET 


SH 


SS 


SB SBG 


SBT 


SH 


Mean 


30.22 


24.03 


24.03 


31.32 


29.98 


47.75 


39.01 39.01 


53.42 


48.28 


Median 


13.53 


11.14 


11.13 


14.45 


13.88 


21.66 


18.32 18.33 


24.57 


21.95 


Std. Dev. 


44.65 


33.20 


33.20 


43.58 


42.05 


68.22 


54.99 54.99 


73.92 


67.92 
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„2 
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SBG 


SBT 


SH 


Mean 
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56.11 


56.11 


73.76 


67.02 


Median 


29.88 


25.21 


25.20 


34.35 


30.38 


Std. Dev. 


97.04 


80.20 


80.22 


103.29 


96.12 




(a) ai = 32 



(b) = 64 



(c) ai = 128 



Fig 4. Empirical performance as measured by MSE, with x drawn from a truncated Normal 
distribution and scaling coefficient s fixed to 100 
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13.84 
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59.63 


54.51 


55.18 


64.19 


59.91 












Median 
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Fig 5. Empirical performance as measured by MSE, with x drawn from a truncated Lapla- 
cian distribution and scaling coefficient s fixed to 100 
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with respect to any assumed prior distribution of wavelet coefficients. Over- 
all, it can be seen that only the multiscale model of [22] offers comparable 
performance. 

4.2. Error and Perceptual Quality for Standard Test Images. We now 
consider an image reconstruction scenario using a test set of well-known 
8-bit gray scale test images that feature frequently in the engineering lit- 
erature: "Barbara," "boat," "clown," "fingerprint," "house," "Lena," and 
"peppers." Corresponding pixel values are considered as the true under- 
lying intensity function of interest; both noise level characterization and 
reconstruction results are reported in terms of signal-to-noise ratio (SNR) 
in decibels, a quantity proportional to log-MSE. By way of competing ap- 
proaches we consider [6,22,28,33], with [6,28] used in conjunction with the 
variance stabilization methods of [1,9]. Implementations were set at an equal 
baseline implementation comprising a 3-level undecimated Haar wavelet de- 
composition, with no a priori neighborhood structures assumed amongst the 
coefficients. 

The performance of the Skellam methods proposed here offers noticeable 
improvements over alternative approaches, in terms of visual quality (Fig. 8), 
mean-squared error (Table 1), and perceptual error (Table 2). In terms of 
visual quality, we have generally observed that the proposed Skellam Bayes 
approaches yield restored images in which the spatial smoothing is appro- 
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(a) Smooth 




SS SBL SBT SH AUH AUS CH CS K TN 



(b) Blocks 




SS SBL SBT SH AUH AUS CH CS K TN 




SS SBL SBT SH AUH AUS CH CS K TN 



(c) Bumps 



(d) Angles 



14 

111 

212 





SS SBL SBT SH AUH AUS CH CS K TN 



(e) Spikes 




SS SBL SBT SH AUH AUS CH CS K TN 



(f) Bursts 



Fig 7. Mean-squared error, averaged over 100 trials, corresponding to reconstruction of 
the prototype functions of Fig. 6. (Note the difference m scale across the figure panels.) 
Skellam-based approaches comprise the left-hand portion of each figure panel; AUH/AUS 
denotes Anscombe variance stabilization [1] with hard/soft universal thresholding [5J; 
CH/CS denotes corrected hard/soft thresholding [23]; K indicates the multiscale model of 
Kolaczyk [22]; and TN is the multiscale multiplicative innovation model of Timmermann 
& Nowak [33] 
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(c) SkellamShrink (SS) (d) Bayes exact posterior mean (SB) 



(e) Bayes Laplacian thresholding (SBT) (f) Adjusted-threshold hybrid (SH) 




(g) Haar-Fisz [9] with [28] (h) Multiscale multiplicative model [33] 



Fig 8. Performance comparison of the wavelet-based estimators derived in Section 3 rel- 
ative to existing approaches, shown for the "clown" test image 
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Table 1 

Average reconstruction SNR (dB) for a set of standard test images 





SNR 


SS 


SB 


SBL 


SBT 


SH 


[1,6] 


[6,9] 


[22] 


[33] 


Mean 




14.63 


15.56 


15.55 


15.53 


15.67 


7.13 


6.68 


9.25 


15.42 


Median 




14.28 


15.24 


15.26 


15.31 


15.35 


6.35 


6.67 


9.30 


15.30 


Min 


dB 


12.67 


13.51 


13.42 


13.00 


13.39 


6.23 


6.39 


8.73 


12.52 


Max 




16.38 


17.69 


17.70 


17.82 


17.89 


9.20 


7.11 


9.49 


17.76 


Std. Dev. 




1.38 


1.62 


1.64 


1.78 


1.70 


1.23 


0.30 


0.25 


1.90 


Mean 




15.96 


16.71 


16.69 


16.64 


16.85 


10.36 


9.79 


10.99 


16.39 


Median 




15.62 


16.08 


16.10 


16.08 


16.17 


10.59 


9.90 


11.10 


16.05 


Min 


3 dB 


14.28 


14.73 


14.68 


14.27 


14.80 


8.86 


8.97 


10.25 


13.15 


Max 




18.24 


19.02 


19.05 


19.15 


19.26 


10.90 


10.15 


11.34 


19.00 


Std. Dev. 




1.53 


1.75 


1.78 


1.92 


1.84 


0.71 


0.42 


0.35 


2.15 


Mean 




19.78 


19.81 


19.77 


19.80 


20.17 


16.36 


16.36 


17.15 


19.46 


Median 




19.62 


19.48 


19.50 


19.76 


20.35 


16.77 


16.63 


17.62 


20.22 


Min 


10 dB 


17.58 


17.68 


17.51 


17.46 


17.71 


15.22 


15.41 


15.82 


16.36 


Max 




22.27 


22.23 


22.22 


22.34 


22.61 


16.87 


17.15 


18.20 


22.20 


Std. Dev. 




1.89 


1.86 


1.89 


2.01 


1.98 


0.65 


0.66 


0.95 


2.37 



Table 2 

Average reconstruction S SIM for images at 0, 3, and 10 dB SNR 





Image 


SS 


SB 


SBL 


SBT 


SH 


[1,6] 


[6,9] 


[22] 


[33] 


Mean 


0.050 


0.463 


0.523 


0.523 


0.538 


0.547 


0.172 


0.132 


0.214 


0.524 


Median 


0.048 


0.439 


0.537 


0.540 


0.570 


0.572 


0.140 


0.118 


0.200 


0.563 


Min 


0.026 


0.398 


0.471 


0.474 


0.396 


0.461 


0.084 


0.076 


0.124 


0.323 


Max 


0.083 


0.571 


0.581 


0.584 


0.613 


0.614 


0.318 


0.247 


0.404 


0.604 


Std. Dev. 


0.024 


0.057 


0.044 


0.047 


0.078 


0.058 


0.086 


0.063 


0.096 


0.099 


Mean 


0.088 


0.520 


0.571 


0.574 


0.600 


0.607 


0.257 


0.207 


0.260 


0.574 


Median 


0.083 


0.512 


0.585 


0.584 


0.610 


0.606 


0.247 


0.196 


0.244 


0.614 


Min 


0.045 


0.425 


0.504 


0.510 


0.523 


0.546 


0.159 


0.124 


0.158 


0.378 


Max 


0.149 


0.611 


0.628 


0.635 


0.674 


0.670 


0.383 


0.321 


0.446 


0.660 


Std. Dev. 


0.041 


0.061 


0.043 


0.043 


0.060 


0.048 


0.083 


0.077 


0.102 


0.098 


Mean 


0.261 


0.698 


0.683 


0.693 


0.731 


0.731 


0.474 


0.465 


0.516 


0.707 


Median 


0.228 


0.694 


0.676 


0.692 


0.752 


0.748 


0.459 


0.440 


0.501 


0.707 


Min 


0.151 


0.628 


0.593 


0.602 


0.646 


0.651 


0.335 


0.341 


0.404 


0.645 


Max 


0.431 


0.786 


0.791 


0.791 


0.789 


0.807 


0.641 


0.634 


0.666 


0.778 


Std. Dev. 


0.108 


0.053 


0.063 


0.060 


0.056 


0.056 


0.121 


0.116 


0.103 


0.055 
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priately locally adaptive — for example, these methods yield effective noise 
attenuation in both bright (see forehead) and dark (see black background) 
regions of the example image shown in Fig. 8. A comparison of Figs. 8(c) 
and (f) reveals the importance of incorporating the scaling coefficient s ex- 
plicitly in the estimator; images processed via SBT tended to be similar to 
those for which SH was used, but with softer edges. In comparison, meth- 
ods based on variance stabilization typically fail to completely resolve the 
heteroscedasticity of the underlying process, as evidenced by the under- and 
over-smoothed noise in bright regions such as the forehead and hair tex- 
tures of Fig. 8(g). The Bayesian method of [33] typically yields far smoother 
output images, in which texture information is almost entirely lost; see, for 
example, the hair in Fig. 8(h). With the exception of SS, Skellam-based es- 
timation methods suffer considerably less from the reconstruction artifacts 
typically associated with wavelet-based denoising, as can be seen in the cheek 
structure of the "clown" image. 

We also report numerical evaluations of estimator performance in this 
setting, by way of both SNR in Table 1 and the widely-used perceptual er- 
ror metric of Structural Similarity Index (SSIM) [36] in Table 2, for input 
SNR of 0, 3, and 10 dB. The results readily confirm that Skellam-based 
approaches outperform competing alternatives, with only that of [33] re- 
maining competitive — though as described above, its oversmoothing results 
in a great deal of loss of texture. The SkellamShrink adjusted-threshold 
hybrid (SH) method measures the best in terms of both SNR and SSIM, 
with other Skellam-based approaches generally outperforming all alterna- 
tives save for [33]. 

5. Discussion. In this article we derived new techniques for wavelet- 
based Poisson intensity estimation by way of the Skellam distribution. Two 
main theorems, one showing the near-optimality of Bayesian shrinkage and 
the other providing for a means of frequentist unbiased risk estimation, 
served to yield new estimators in the Haar transform domain, along with 
low-complexity algorithms for inference. A simulation study using standard 
wavelet test functions as well as test images confirms that our approaches 
offer appealing alternatives to existing methods in the literature — and in- 
deed subsume existing variance-stabilization approaches such as Haar-Fisz 
by yielding exact unbiased risk estimates — along with a substantial improve- 
ment for the case of enhancing image data degraded by Poisson variability. 
We expect further improvements for specific applications in which correla- 
tion structure can be assumed a priori amongst Haar coefficients, in a man- 
ner similar to the gains reported by [28] for the case of image reconstruction 
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in the presence of additive noise. 
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